Abstract



The problem of controlling higher-order interactions in neural networks is 
addressed with techniques commonly applied in the cluster analysis of quan- 
tum many-particle systems. For multi-neuron synaptic weights chosen ac- 
cording to a straightforward extension of the standard Hebbian learning rule, 
we show that higher-order contributions to the stimulus felt by a given neuron 
can be readily evaluated via Polya's combinatoric group-theoretical approach 
or equivalently by exploiting a precise formal analogy with fermion diagram- 
matics. 
In attempting to unravel the mechanisms of information processing and attendant adaptive
behavior in neurobiological systems, considerable attention is currently being directed
to non-linear processing in dendritic trees and to the computational power that can be 
gained from multiplicative or higher-order interactions between neurons [0,0. This focus 
is supported by a large body of theoretical work demonstrating enhanced performance in 
artificial neural networks involving such higher-order or multi-neuron interactions, as ap- 
plied to a variety of information-processing tasks, most notably memory storage and recall 
H |13| . Introduction of higher-order couplings is accompanied, however, by the threat of a 
combinatoric explosion that may strongly inhibit analysis, evaluation, and optimization. In 
this note we expose some simple techniques based on group-theoretic symmetry arguments 
that serve, in some cases, to reduce the severity of these problems and give access to the
advantages of higher-order networks for problem domains involving complex correlations. Our
study is guided by interesting parallels with the diagrammatic analysis of fermion clusters 
in many-body physics. 

We consider the following simple but standard model of a higher-order neural network. 
The network consists of $N$ binary-output hard-threshold units (model neurons) $i$ whose state
variables $\sigma_i$ take the value $+1$ if the unit is active ("firing") and $-1$ if the unit is inactive
("not firing"). Model neuron $i$ receives inputs from exactly $K_i$ other units of the network,
with self-interactions excluded so that $1 \le K_i \le N - 1$. A given neuron updates its state
on a discrete time grid according to the deterministic threshold rule 

$$\sigma_i(t+1) = \operatorname{sgn}[h_i(t)], \qquad i = 1, \dots, N. \tag{1}$$

Here $h_i(t)$ is the net stimulus felt by the neuron at time $t$, coming from internal and external
inputs but reduced by a threshold parameter. For our purposes it is immaterial whether 
sequential or parallel updating is imposed. The general higher-order synaptic structure of 
the network model is expressed in the assumed form 
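$$\begin{aligned}
h_i(t) &= c_{i0}(t) + \sum_{j_1} c_{ij_1}\,\sigma_{j_1}(t) + \sum_{j_1<j_2} c_{ij_1j_2}\,\sigma_{j_1}(t)\,\sigma_{j_2}(t) + \cdots + \sum_{j_1<j_2<\cdots<j_{K_i}} c_{ij_1j_2\cdots j_{K_i}}\,\sigma_{j_1}(t)\,\sigma_{j_2}(t)\cdots\sigma_{j_{K_i}}(t) \\
&= C_0(i) + C_1(i) + C_2(i) + \cdots + C_{K_i}(i),
\end{aligned} \tag{2}$$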

where the sums include only those $K_i$ neurons from which neuron $i$ receives inputs. The first
term represents any external input to neuron $i$ (reduced by its threshold), while the second
term is the usual one representing binary interactions, a simple linear sum of states of input
neurons weighted by synaptic strengths $c_{ij_1}$. The higher-order terms in the expansion, for
$n \ge 2$, represent "multiplicative" interactions in that they are linear combinations of the
products of two or more input-neuron states. One also speaks of a "sum-of-products" form
for such interactions.
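
As a minimal sketch of the update rule (1) acting on a stimulus of the form (2) truncated after the second-order term, the following Python fragment may help fix ideas. The function names, the dictionary layout chosen for the second-order weights, the assumption that every other unit is an input, and the tie-breaking convention sgn(0) = +1 are all illustrative assumptions; the text does not specify them.

```python
from itertools import combinations

def sgn(h):
    # Assumption: ties (h == 0) are resolved to +1; the text leaves this open.
    return 1 if h >= 0 else -1

def stimulus(i, sigma, c0, c1, c2):
    """Net stimulus h_i of Eq. (2), truncated after the second-order term.

    sigma           : list of +1/-1 states
    c0[i]           : external input to neuron i (reduced by its threshold)
    c1[i][j]        : binary weights c_{ij}
    c2[i][(j1, j2)] : second-order weights c_{i j1 j2}, stored for j1 < j2
    Assumption: neuron i receives input from every other unit (no self-interaction).
    """
    inputs = [j for j in range(len(sigma)) if j != i]
    h = c0[i]
    h += sum(c1[i][j] * sigma[j] for j in inputs)
    h += sum(c2[i].get((j1, j2), 0) * sigma[j1] * sigma[j2]
             for j1, j2 in combinations(inputs, 2))
    return h

def parallel_update(sigma, c0, c1, c2):
    """One synchronous sweep of the threshold rule, Eq. (1)."""
    return [sgn(stimulus(i, sigma, c0, c1, c2)) for i in range(len(sigma))]
```

A sequential variant would update one unit at a time in place; as noted in the text, the distinction is immaterial for the present purposes.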

We observe that the general nth-order contribution, 

$$C_n = \sum_{j_1<j_2<\cdots<j_n} c_{ij_1j_2\cdots j_n}\,\sigma_{j_1}\sigma_{j_2}\cdots\sigma_{j_n}, \tag{3}$$

representing the irreducible interaction of $n$ neurons with neuron $i$, introduces $\binom{K_i}{n} =
K_i!/[n!\,(K_i - n)!]$ weight parameters. Accordingly, specification of the net stimulus (2) requires
$2^{K_i}$ parameters. The exponential explosion of parameters with increasing connectivity $K_i$
has deterred widespread application of higher-order networks, in spite of their theoretical 
advantages. 
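
The parameter count quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical, and the n = 0 term is taken to be the single external-input parameter.

```python
from math import comb

def weights_at_order(K, n):
    """Number of independent n-th order weights for a neuron with K inputs."""
    return comb(K, n)

def total_weights(K):
    """All orders n = 0..K; n = 0 counts the external-input term."""
    return sum(weights_at_order(K, n) for n in range(K + 1))

# The total equals 2**K, so it doubles with every additional input.
for K in (5, 10, 20, 40):
    print(K, total_weights(K), 2 ** K)
```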

Indeed, complete optimization of a network having all possible combinations
of higher-order terms is patently impractical for the sizable values of $K_i$ typically needed
in real-world applications. However, a restricted optimization problem has been attacked
by retaining only a strongly reduced pattern-specific connectivity [13], [15], while otherwise
implementing the extended Hebbian learning rule to be introduced below. A similar strategy
based on a connection-pruning scheme adapted to the pattern domain has been employed
to tame the combinatoric explosion of parameters in higher-order probabilistic perceptrons.

Of course, if the entire array of coefficients $c_{ij_1j_2\cdots j_n}$ is specified at the outset, the explosive
combinatoric optimization problem becomes moot. In this note we shall focus on the fully
connected network in an important special case of "one-shot" learning in which it is feasible
and straightforward to evaluate the general term $C_n$ of the series (2). In fact, by exploiting
standard group-theoretic results, we are actually able to sum this series in the limit of
asymptotically large connectivity ($K_i \to \infty$, implying an infinitely large network).

We consider the familiar task of storage and recall of $p$ random patterns $S^\mu =
\{S_1^\mu, S_2^\mu, \dots, S_N^\mu\}$ in the firing activities of the neuronal units, where again $S_j^\mu \in \{-1, 1\}$.
As is well known [§], [7], such patterns can be faithfully stored as fixed points of the
dynamics (1) of the network model to a capacity $p = O(N^K)$ (with $K = \min_i K_i$), if the
weight parameters of the stimulus expression (2) are chosen according to an extension of the
classical Hebbian learning rule to the presence of interactions of all orders up to $K_i$:



$$c_{ij_1j_2\cdots j_n} = \sum_{\mu=1}^{p} S_i^\mu S_{j_1}^\mu S_{j_2}^\mu \cdots S_{j_n}^\mu, \qquad n = 1, \dots, K_i. \tag{4}$$



The efficacy of memory storage is commonly analyzed in terms of the overlaps 



$$m^\mu(t) = \sum_j S_j^\mu\,\sigma_j(t) \tag{5}$$
of the current network configuration $\{\sigma_1(t), \sigma_2(t), \dots, \sigma_N(t)\}$ with a given pattern $S^\mu$. When
a relative-entropy cost function is adopted [17], this specification can be shown to be optimal
among the class of simple local learning rules (where "local" implies that changes of synaptic
strength depend only on the states of the neurons interacting at the given synapse).

The generic term (3) in the stimulus expansion (2) is evaluated as follows. We first
examine the modified nth-order contribution 

$$\tilde{C}_n = \sum_{j_1\cdots j_n} c_{ij_1\cdots j_n}\,\sigma_{j_1}(t)\cdots\sigma_{j_n}(t) \tag{6}$$

to the net stimulus, which consists of $K_i^n$ terms. This auxiliary quantity contains redundant
terms of two kinds: (i) "diagonal" terms in which two or more of the indices $j_1, \dots, j_n$
coincide and (ii) "symmetrical" terms differing only through a permutation of distinct labels
$j_1, \dots, j_n$, which may be combined into a single term by redefining the weight parameter
$c_{ij_1\cdots j_n}$ as the sum of the weight parameters with permuted indices. The former terms are
redundant because they already appear in lower-order contributions of the expansion (2).
The latter terms lead to overcounting by a factor $n!$.

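Inserting the learning rule (4) into Eq. (6) and interchanging the order of the summations,
we may write

$$\tilde{C}_n = \sum_{j_1\cdots j_n} c_{ij_1\cdots j_n}\,\sigma_{j_1}\cdots\sigma_{j_n} = \sum_{j_1\cdots j_n}\sum_{\mu=1}^{p} S_i^\mu S_{j_1}^\mu\cdots S_{j_n}^\mu\,\sigma_{j_1}\cdots\sigma_{j_n}. \tag{7}$$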

The desired $n$th-order contribution $C_n$ and its modified counterpart $\tilde{C}_n$ are evidently related
by

$$C_n = \sum_{j_1<\cdots<j_n} c_{ij_1\cdots j_n}\,\sigma_{j_1}\cdots\sigma_{j_n} = \frac{1}{n!}\,\tilde{C}_n \det(\delta_{j_\alpha j_\beta}). \tag{8}$$

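The $n \times n$ determinant in (8), understood to act inside the sum over $j_1, \dots, j_n$, eliminates
all "diagonal" terms with two or more indices coincident (two of its rows are then identical, so it
vanishes, whereas for distinct indices it equals unity), while the statistical factor $n!$
compensates for the overcounting of symmetrical terms.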

It is next convenient to define "generalized" overlaps 

$$m_\alpha^\mu(t) = \sum_j \left[S_j^\mu\,\sigma_j(t)\right]^\alpha \tag{9}$$

of the current network configuration with one of the prescribed patterns, $\alpha$ being a positive
integer. Since $(S_j^\mu)^2 = \sigma_j^2 = 1$, the quantity $m_\alpha^\mu(t)$ reduces to $K_i$ for $\alpha$ even and to $m^\mu(t)$
for $\alpha$ odd. Appealing to direct evaluation of the right-hand side of Eq. (3) or Eq. (8) for
$n = 1$ to $4$, we establish the pattern of behavior for the higher orders:

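$$C_1 = \sum_\mu S_i^\mu \left[m_1^\mu\right], \tag{10}$$

$$C_2 = \sum_\mu S_i^\mu\,\frac{1}{2!}\left[(m_1^\mu)^2 - m_2^\mu\right], \tag{11}$$

$$C_3 = \sum_\mu S_i^\mu\,\frac{1}{3!}\left[(m_1^\mu)^3 - 3m_1^\mu m_2^\mu + 2m_3^\mu\right], \tag{12}$$

and

$$C_4 = \sum_\mu S_i^\mu\,\frac{1}{4!}\left[(m_1^\mu)^4 - 6(m_1^\mu)^2 m_2^\mu + 8m_1^\mu m_3^\mu + 3(m_2^\mu)^2 - 6m_4^\mu\right]. \tag{13}$$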
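
The closed forms for n = 1 to 4 can be checked numerically against the defining sum over ordered index sets. The sketch below is illustrative only: the names `brute_force` and `via_overlaps`, the network size, and the randomly drawn patterns are assumptions, and the K inputs of neuron i are simply indexed 0..K-1.

```python
import itertools
import math
import random

def brute_force(n, target, inputs, sigma):
    """C_n from the sum over j1 < ... < jn, with Hebbian weights built on the fly."""
    total = 0
    for js in itertools.combinations(range(len(sigma)), n):
        weight = sum(t * math.prod(pat[j] for j in js)
                     for t, pat in zip(target, inputs))
        total += weight * math.prod(sigma[j] for j in js)
    return total

def via_overlaps(n, target, inputs, sigma):
    """C_n from the bracketed polynomials in the generalized overlaps (n = 1..4)."""
    total = 0
    for t, pat in zip(target, inputs):
        # m[a] is the generalized overlap m_a; m[0] is a placeholder.
        m = [0] + [sum((s * x) ** a for s, x in zip(pat, sigma)) for a in range(1, 5)]
        bracket = {
            1: m[1],
            2: m[1] ** 2 - m[2],
            3: m[1] ** 3 - 3 * m[1] * m[2] + 2 * m[3],
            4: m[1] ** 4 - 6 * m[1] ** 2 * m[2] + 8 * m[1] * m[3]
               + 3 * m[2] ** 2 - 6 * m[4],
        }[n]
        total += t * bracket // math.factorial(n)  # bracket is a multiple of n!
    return total

random.seed(0)
K, p = 8, 3
inputs = [[random.choice((-1, 1)) for _ in range(K)] for _ in range(p)]  # S_j^mu
target = [random.choice((-1, 1)) for _ in range(p)]                      # S_i^mu
sigma = [random.choice((-1, 1)) for _ in range(K)]                       # current state
for n in range(1, 5):
    print(n, brute_force(n, target, inputs, sigma), via_overlaps(n, target, inputs, sigma))
```

The brute-force route visits every index combination, whereas the overlap route needs only the p overlaps, which is the practical point of the construction.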

It is seen that the generic term $C_n$ is built as a sum over all patterns of individual terms of
the form

$$S_i^\mu\,\frac{1}{n!}\,\gamma(\alpha_1, \dots, \alpha_n)\prod_{l=1}^{n}\left(m_l^\mu\right)^{\alpha_l}, \tag{14}$$

where $\gamma(\alpha_1, \dots, \alpha_n)$ is a statistical weight factor and the generalized overlaps $m_l^\mu$ enter with
non-negative integral powers $\alpha_l$ satisfying the partitioning condition

$$\sum_{l=1}^{n} l\,\alpha_l = n. \tag{15}$$



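The statistical factor is found to obey the sum rules

$$\sum_{(\alpha)}\gamma(\alpha_1, \dots, \alpha_n) = 0 \quad\text{and}\quad \sum_{(\alpha)}\left|\gamma(\alpha_1, \dots, \alpha_n)\right| = n!, \tag{16}$$

and can be constructed as

$$\gamma(\alpha_1, \dots, \alpha_n) = n!\Big/\prod_{l=1}^{n}\left[(-1)^{\alpha_l+1}\,l^{\alpha_l}\,\alpha_l!\right]. \tag{17}$$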
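
The statistical weights and their sum rules are easy to tabulate. The following sketch enumerates all vectors satisfying the partitioning condition (15), evaluates the weight of Eq. (17), and prints both sums of Eq. (16); the helper names are hypothetical. Note that the vanishing of the signed sum applies for n of at least 2 (for n = 1 the single weight is 1).

```python
from math import factorial

def cycle_types(n):
    """Yield all vectors (a_1, ..., a_n) of non-negative integers with sum_l l*a_l = n."""
    def extend(l, remaining):
        if l > n:
            if remaining == 0:
                yield ()
            return
        for a in range(remaining // l + 1):
            for tail in extend(l + 1, remaining - l * a):
                yield (a,) + tail
    yield from extend(1, n)

def gamma(alpha):
    """Statistical weight of Eq. (17) for a vector alpha = (a_1, ..., a_n)."""
    n = sum(l * a for l, a in enumerate(alpha, start=1))
    sign, denom = 1, 1
    for l, a in enumerate(alpha, start=1):
        sign *= (-1) ** (a + 1)
        denom *= l ** a * factorial(a)
    return sign * (factorial(n) // denom)

for n in range(2, 8):
    weights = [gamma(a) for a in cycle_types(n)]
    print(n, sum(weights), sum(abs(w) for w in weights), factorial(n))
```

For n = 4 the five weights are 1, -6, 8, 3 and -6, which are the coefficients appearing in Eq. (13).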

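Thus, for arbitrary $n$, the contribution $C_n$ can be written explicitly as

$$C_n = \sum_{\mu=1}^{p} S_i^\mu\,\mathcal{P}_n(m_1^\mu, \dots, m_n^\mu), \tag{18}$$

where

$$\mathcal{P}_n(m_1, \dots, m_n) = \frac{1}{n!}\sum_{(\alpha)}\gamma(\alpha_1, \dots, \alpha_n)\prod_{l=1}^{n} m_l^{\alpha_l}. \tag{19}$$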



The sum over $\alpha$ in definition (19) extends only over those $n$-dimensional vectors $\alpha =
(\alpha_1, \dots, \alpha_n)$ whose components satisfy the constraint (15). The quantity $\mathcal{P}_n(m_1, \dots, m_n)$ is
identified as a generalized Polya polynomial [18] of the symmetric group $S_n$, with the signs
$(-1)^{\alpha_l+1}$ of the corresponding cyclic permutations incorporated.
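
One way to see the group-theoretic content of this polynomial is to evaluate it as a signed average over all permutations of n objects, each l-cycle contributing a factor m_l with sign (-1)^(l+1). Grouping the permutations by cycle type then gives the sum over partition vectors. The sketch below takes that route for small n and compares the result with a direct sum over products of n distinct inputs; all names are hypothetical and the input values are arbitrary.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial, prod

def cycle_lengths(perm):
    """Lengths of the cycles of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def signed_cycle_index(n, m):
    """Signed average over S_n of the product of m[l] over the cycles of each permutation."""
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for l in cycle_lengths(perm):
            term *= (-1) ** (l + 1) * m[l]
        total += term
    return Fraction(total, factorial(n))

x = [1, -1, 1, 1, -1, 1]  # products S_j * sigma_j for one pattern (illustrative values)
m = {l: sum(v ** l for v in x) for l in range(1, 5)}  # generalized overlaps m_l
for n in range(1, 5):
    direct = sum(prod(c) for c in combinations(x, n))  # sum over j1 < ... < jn
    print(n, signed_cycle_index(n, m), direct)
```

Enumerating n! permutations is practical only for small n; it is used here for transparency rather than efficiency.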

For given $n$, the total number of solutions $P(n)$ of condition (15) can be determined by
induction from the recurrence relation [21]

$$P(n) = \frac{1}{n}\sum_{q=1}^{n}\rho(q)\,P(n-q), \tag{20}$$

in which the divisor function $\rho(q)$ is the sum of the first powers of the divisors of $q$, and $P(0) = 1$. For large
$n$, $P(n)$ behaves asymptotically as

$$P(n) \simeq \frac{1}{4n\sqrt{3}}\exp\!\left(\pi\sqrt{\frac{2n}{3}}\right). \tag{21}$$
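
The recurrence and the asymptotic estimate can be compared directly. The sketch below assumes the starting value P(0) = 1 and uses the standard Hardy-Ramanujan leading-order form for the estimate; the function names are illustrative.

```python
from math import exp, pi, sqrt

def divisor_sum(q):
    """Sum of the first powers of the divisors of q."""
    return sum(d for d in range(1, q + 1) if q % d == 0)

def partition_counts(n_max):
    """P(0), ..., P(n_max) from the divisor-sum recurrence, started from P(0) = 1."""
    P = [1]
    for n in range(1, n_max + 1):
        total = sum(divisor_sum(q) * P[n - q] for q in range(1, n + 1))
        P.append(total // n)  # the sum is always an exact multiple of n
    return P

def asymptotic(n):
    """Leading-order large-n estimate of P(n)."""
    return exp(pi * sqrt(2 * n / 3)) / (4 * n * sqrt(3))

P = partition_counts(100)
for n in (7, 20, 50, 100):
    print(n, P[n], round(asymptotic(n)))
```

The number of distinct terms in the nth-order polynomial therefore grows much more slowly than the number of weight parameters, although still faster than any power of n.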

Finally, the generating function of the Polya polynomials may be employed to calculate
the sum of all individual $n$th-order contributions, i.e. the net internal stimulus $h_i(t)$ of Eq. (2),
in the limit of large connectivity $K_i$, which is equivalent to the thermodynamic limit. One finds

$$\sum_{n=0}^{\infty} C_n = \sum_{\mu=1}^{p} S_i^\mu \exp\!\left[\sum_{l=1}^{\infty}\frac{(-1)^{l+1}}{l}\,m_l^\mu\right] \qquad (K_i \to \infty). \tag{22}$$



While this is a beautiful formal result, practical neural network applications often work with
a single fixed order or with a few low orders adapted to the complexity of the problem (see,
e.g. Ref. HH).

Combinatoric group-theoretical considerations reveal an interesting one-to-one correspondence
between the $n$th-order contribution $C_n$ to the stimulus sum (2) and the sum
of planar $n$-particle cluster diagrams for noninteracting particles obeying Fermi statistics.
(Substitution of a permanent for the determinant in expression (8) would produce a one-to-one
correspondence with the sum of Bose $n$-body cluster diagrams.) Each fermion cluster
diagram is uniquely defined by an $n$-dimensional vector $(\alpha_1, \dots, \alpha_n)$ satisfying relation (15)
and specifying a partitioning of the $n$-particle cluster into sub-clusters correlated by exchange,
namely into $\alpha_1$ 1-cycles, $\alpha_2$ 2-cycles, ..., and $\alpha_n$ $n$-cycles. The statistical weight
factor $\gamma(\alpha_1, \dots, \alpha_n)$ is, apart from its sign, the number of ways in which $n$ particles can be
assigned to $\alpha_l$ exchange clusters of size $l$, with $l$ running from 1 to $n$. Figure 1 shows all possible cluster
diagrams up to order $n = 7$. Each contributing diagram consists of $n$ filled dots and the
associated exchange lines. Reflecting the Fermi (or Bose) symmetry of the wave function,
the exchange lines only occur in closed loops: the particles belonging to a given exchange
cluster appear as nodes in a continuous circuit of lines that represents a transposition or
cyclic permutation. Cluster diagrams of this type (though with additional lines representing
dynamical correlations) are used in the description of non-interacting fermions or bosons in
the correlated wave-function and correlated density-matrix formalisms [19], [20].

A large number of computer experiments [EJ have established the following behavior of 
higher-order networks when applied to problems in pattern recognition. When the patterns 
to be recognized are structured rather than random, the network dynamics usually converges 
to the pattern with closest structural similarity to the initial pattern, rather than to (or to 
a state very near) the pattern having largest overlap with the initial state. This behavior
contrasts with that of first-order networks having only binary synapses [24]; relative to these
conventional systems, higher-order networks demonstrate a greatly enhanced capability for
structural discrimination of arbitrarily complex patterns. Moreover, when functioning in the
regime of dilute pattern storage (i.e., far from saturation, thus $p \sim N \ll N^K$, $K > 2$), the
basins of attraction of the memorized patterns are dramatically enlarged. Finally, it is to 
be emphasized that in the model we have considered, the combinatoric explosion of weight 
coefficients is obviated, since the network only needs to know the overlaps of the present 
state with all the patterns to be embedded. 
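
To illustrate that the overlaps alone suffice, the sketch below evaluates the contributions of every order from the p ordinary overlaps. It does not use the explicit partition sum; instead it substitutes Newton's identities for elementary symmetric polynomials, which generate the same quantities recursively. That substitution, the function names, and the small test network are assumptions made for illustration.

```python
from itertools import combinations
from math import prod

def stimulus_terms(target, inputs, sigma, n_max):
    """Return [C_1, ..., C_n_max] using only the overlap m of each stored pattern.

    For +1/-1 states the generalized overlap m_a equals K for even a and m for odd a.
    Newton's identities, n*e_n = sum_k (-1)**(k-1) * e_(n-k) * m_k, are used here
    as a stand-in for the explicit sum over partition vectors.
    """
    K = len(sigma)
    C = [0] * (n_max + 1)
    for t, pat in zip(target, inputs):
        m = sum(s * x for s, x in zip(pat, sigma))  # ordinary overlap
        e = [1]
        for n in range(1, n_max + 1):
            acc = sum((-1) ** (k - 1) * e[n - k] * (m if k % 2 else K)
                      for k in range(1, n + 1))
            e.append(acc // n)  # exact: acc is a multiple of n
        for n in range(1, n_max + 1):
            C[n] += t * e[n]
    return C[1:]

def brute_force(n, target, inputs, sigma):
    """Reference value: explicit sum over all index sets j1 < ... < jn."""
    return sum(t * prod(pat[j] * sigma[j] for j in js)
               for js in combinations(range(len(sigma)), n)
               for t, pat in zip(target, inputs))

inputs = [[1, -1, 1, 1, -1, 1], [1, 1, -1, 1, 1, -1]]  # S_j^mu for K = 6 inputs
target = [1, -1]                                        # S_i^mu
sigma = [1, 1, 1, -1, -1, 1]                            # current state of the inputs
print(stimulus_terms(target, inputs, sigma, 6))
print([brute_force(n, target, inputs, sigma) for n in range(1, 7)])
```

The recursive route costs on the order of p times (K plus the square of the highest order), whereas the reference sum visits every index combination.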

This paper is a contribution to the ZiF Research Year on the Sciences of Complexity: 
From Mathematics to Complexity to a Sustainable World. The research was supported in 
part by the U.S. National Science Foundation under Grant No. PHY-9900713. 
Figure Caption

Fig. 1. All possible fermion cluster diagrams for $n = 2, 3, \dots, 7$, in the absence of dynamical
correlations. 